Datasets, parsers and benchmarks #733
Open · ThomSerg wants to merge 48 commits into master from benchmark_datasets
With all of the work on datasets / benchmarking I've been doing lately, I thought of cleaning up the code and creating some reusable structures to limit code duplication and ease the creation of new datasets / benchmarks.
This pull request is definitely not in a state to be merged, but I'm opening it anyway as a starting point for discussion and to receive feedback. Most things in here are just ideas, meant to start the conversation on more generic datasets / parsers / (competition) benchmarks. I'm not at all "attached" to any of the code. I just tried to get something to work first; now we can start discussing how it should be done "properly".
cpmpy.tools.dataset
A new dataset module is introduced (`cpmpy.tools.dataset`) as a central place to collect ... datasets. I could have placed the code directly in here, but as discussed multiple times internally, we have multiple different concepts of datasets; in short, we distinguish between "model" datasets and "problem" datasets.
Due to this distinction, I also put the "model"-datasets inside a "model" subdirectory.
3 "model" datasets have been added:
(I have a version of PSPLIB, but this one actually belongs to the "problem"-dataset category.)
Each dataset subclasses the generic `_Dataset`, which implements logic that should be shared across all datasets and which provides dataset-specific methods to be overwritten. Mostly, each dataset defines its constructor arguments (e.g. year, track, ...) and a `download` method, so it is quite easy to add new datasets.
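To make that concrete, here is a minimal sketch of the idea; the `_Dataset` body, the subclass name and the URL below are illustrative placeholders, and the actual base class in this PR implements more of the shared logic.

```python
import urllib.request
from pathlib import Path


class _Dataset:
    """Illustrative stand-in for the generic base class: shared logic
    (caching, iterating over instances, ...) lives here."""

    def __init__(self, root="./data"):
        self.root = Path(root)

    def download(self):
        raise NotImplementedError("each dataset implements its own download")

    def instances(self):
        # shared behaviour: fetch the data on first use, then yield instance files
        if not self.root.exists():
            self.download()
        yield from sorted(p for p in self.root.rglob("*") if p.is_file())


class ExampleCompetitionDataset(_Dataset):
    """Hypothetical dataset: constructor arguments (year, track) select the
    variant, download() knows where to fetch it from."""

    URL = "https://example.org/instances/{year}/{track}.zip"  # placeholder URL

    def __init__(self, year=2024, track="main", root="./data"):
        super().__init__(root=Path(root) / f"example-{year}-{track}")
        self.year, self.track = year, track

    def download(self):
        self.root.mkdir(parents=True, exist_ok=True)
        archive = self.root / "instances.zip"
        urllib.request.urlretrieve(self.URL.format(year=self.year, track=self.track), archive)
        # unpacking of the archive is omitted for brevity
```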
Parsers
For each of the datasets, respective parsers have been added to the tools:
You'll notice some differences in the names, due to data formats being more generic than datasets, e.g. MSE is formulated in the more generic WCNF format.
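The shape all parsers share is "callable that takes an instance path and returns a CPMpy Model". As a self-contained illustration of that interface (a stripped-down DIMACS-CNF reader, not one of the parsers added in this PR):

```python
from cpmpy import Model, boolvar, any as cp_any


def read_cnf(path):
    """Minimal DIMACS-CNF-style reader: one Boolean variable per DIMACS index,
    one disjunction per clause. Purely illustrative of the interface (path in,
    Model out); the parsers in this PR handle richer formats."""
    model = Model()
    variables = {}  # DIMACS index -> CPMpy Boolean variable

    def lit(l):
        if abs(l) not in variables:
            variables[abs(l)] = boolvar(name=f"x{abs(l)}")
        var = variables[abs(l)]
        return var if l > 0 else ~var

    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line[0] in "cp%":  # skip comments, problem line, end marker
                continue
            clause = [int(tok) for tok in line.split() if tok != "0"]
            model += cp_any([lit(l) for l in clause])
    return model
```

Any such callable can then be handed to the benchmark runner described below.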
cpmpy.tools.benchmark
Whilst running experiments, I've collected many variants of the XCSP3 benchmark runner adapted to the other datasets. So I thought, why not do the same exercise here? This is the part I'm least certain about in terms of how it should best be done.
So next to data formats and datasets, we also have "formalised" benchmarks. They're decoupled from both the parser and the dataset. Take the PB competition as an example: it defines an input format, an output format, and rules on how to "behave" (e.g. how to handle a SIGTERM). The WCNF parser covers the input part, so that one gets reused. The OPB dataset covers instances to test on, but any other dataset in the WCNF format can also be used within the rules of the PB competition. All the other competition rules get captured in this new "benchmark" object.

I again provided a more generic `Benchmark` to be subclassed, but in this case it is also usable on its own: simply provide a callable parser and a path to an instance, and the model will be created and solved with the niceties of us handling everything (memlimits, timeouts, printing, capturing results, ...). Many more arguments are available (like with the xcsp3 competition).
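As a rough sketch of the intended call site: the import path follows this PR's module layout, but the argument names and the `run()` method below are placeholders rather than the final API.

```python
from cpmpy.tools.benchmark import Benchmark   # generic runner proposed in this PR
from my_parsers import read_wcnf              # any callable: instance path -> cpmpy.Model

# The Benchmark object takes care of the "niceties": memory limits, time limits,
# printing and capturing of results.
bench = Benchmark(
    parser=read_wcnf,                    # callable parser
    instance="instances/example.wcnf",   # path to the instance to solve
    solver="ortools",                    # which CPMpy solver to run
    time_limit=1800,                     # seconds
    mem_limit=8000,                      # MB
)
result = bench.run()
```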
But to follow the PB-competition-specific rules, we also have the pre-made `OPBBenchmark` subclass, which customises `Benchmark` to the rules of the PB competition. Since a lot of the Benchmark's behavior has been compartmentalized into different methods, any subclass can easily overwrite these to customise according to the competition rules (e.g. how to format the result, how to report on intermediate results, how to handle sigterms, ...). This subclassing of `Benchmark` allows for the creation of many competition runners with as little duplicate code as possible.
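Roughly what that subclassing could look like; the hook names below are invented for the sake of the sketch and do not necessarily match the actual `OPBBenchmark`.

```python
from cpmpy.tools.benchmark import Benchmark   # same assumed import as above


class PBStyleBenchmark(Benchmark):
    """Illustrative subclass: one overridable hook per competition rule."""

    def print_result(self, status, objective=None):
        # PB-competition-style output: objective line followed by a status line
        if objective is not None:
            print(f"o {objective}", flush=True)
        print(f"s {status}", flush=True)

    def print_intermediate(self, objective):
        # report every improving bound as soon as it is found
        print(f"o {objective}", flush=True)

    def handle_sigterm(self, signum, frame):
        # competition rule: on SIGTERM, report what is known so far and exit cleanly
        self.print_result("UNKNOWN")
        raise SystemExit(0)
```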
That's about it. A lot of code that should probably become separate pull requests after we figure out what to do with it.

(tools.xcsp3 still contains a lot of code from before I attempted to bring things together, e.g. it still has its own dataset / benchmark runner.)